Automatic Vocabulary Adaptation Based on Semantic Similarity and Speech Recognition Confidence Measure
Abstract
Out-Of-Vocabulary (OOV) word utterances are unavoidable in speech recognition because the vocabulary of a recognition dictionary is limited in size. Automatic vocabulary adaptation, which selects unregistered (i.e., OOV) words from relevant documents and registers them in the dictionary with appropriate probability values, is therefore an important technique. To improve recognition accuracy, a vocabulary adaptation method should register only relevant words that will actually be spoken in the target utterances, and should avoid registering words that will not be spoken (i.e., redundant word entries). In this paper, we propose a novel automatic vocabulary adaptation method that satisfies these requirements on the basis of semantic and acoustic similarity, where acoustic similarity is represented by a speech recognition confidence measure. Experiments show that, compared with conventional methods, our method doubles word selection accuracy and improves recognition accuracy on newly registered words by 15.1% in F-measure.
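The abstract does not spell out how the two similarities are combined. As a rough illustration only, the sketch below assumes each candidate OOV word carries a semantic-similarity score and a speech recognition confidence score, both in [0, 1], merges them with a hypothetical linear weight `alpha`, registers words whose combined score reaches a hypothetical `threshold`, and then scores the selected set with F-measure against the words actually spoken. The `Candidate` type, `alpha`, `threshold`, and the linear combination are all assumptions for illustration, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """An OOV word found in a relevant document (hypothetical structure)."""
    word: str
    semantic_sim: float  # similarity to the target topic, assumed in [0, 1]
    confidence: float    # recognizer confidence measure, assumed in [0, 1]


def select_words(candidates, alpha=0.5, threshold=0.6):
    """Return the words whose combined score reaches the threshold.

    The linear combination of semantic and acoustic scores is an
    illustrative assumption, not the method proposed in the paper.
    """
    selected = set()
    for c in candidates:
        score = alpha * c.semantic_sim + (1.0 - alpha) * c.confidence
        if score >= threshold:
            selected.add(c.word)
    return selected


def f_measure(selected, spoken):
    """Harmonic mean of precision and recall of the selected word set."""
    true_pos = len(selected & spoken)
    if true_pos == 0:
        return 0.0  # also avoids division by zero when nothing is selected
    precision = true_pos / len(selected)
    recall = true_pos / len(spoken)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    candidates = [
        Candidate("transformer", 0.9, 0.8),
        Candidate("banana", 0.1, 0.7),
        Candidate("backprop", 0.7, 0.4),
    ]
    spoken = {"transformer", "backprop", "dropout"}
    chosen = select_words(candidates)
    print(sorted(chosen))
    print(round(f_measure(chosen, spoken), 3))
```

With the default settings only "transformer" passes the threshold, so the script prints `['transformer']` and `0.5`: no redundant entry is registered (precision 1.0), but only one of the three spoken words is covered (recall 1/3). Raising `alpha` or lowering `threshold` trades redundant entries against missed words, which is the balance the abstract describes.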
Similar resources
Confidence Measure Based on Context Consistency Using Word Occurrence Probability and Topic Adaptation for Spoken Term Detection
In this paper, we propose a novel confidence measure to improve the performance of spoken term detection (STD). The proposed confidence measure is based on the context consistency between a hypothesized word and its context in a word lattice. The main contribution of this paper is to compute the context consistency by considering the uncertainty in the results of speech recognition and the effe...
New Adaptation Techniques for Large Vocabulary Continuous Speech Recognition
This paper proposes several new speaker adaptation techniques to improve large vocabulary continuous speech recognition accuracy. These include discriminative adaptation, state-quality measure based adaptation, and N-best hypothesis based adaptation schemes. We propose to incorporate the MMIE criterion in the computation of the posterior counts from the adaptation data. We present a new me...
Exploring Content Features for Automated Speech Scoring
Most previous research on automated speech scoring has focused on restricted, predictable speech. For automated scoring of unrestricted spontaneous speech, speech proficiency has been evaluated primarily on aspects of pronunciation, fluency, vocabulary and language usage but not on aspects of content and topicality. In this paper, we explore features representing the accuracy of the content of ...
Distractor Generation for Chinese Fill-in-the-blank Items
This paper reports the first study on automatic generation of distractors for fill-in-the-blank items for learning Chinese vocabulary. We investigate the quality of distractors generated by a number of criteria, including part-of-speech, difficulty level, spelling, word co-occurrence and semantic similarity. Evaluations show that a semantic similarity measure, based on the word2vec model, yields...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...